This title is a bit provocative, yet strictly speaking it is correct. We will show two examples that reveal what GAMMs are missing in order to become tools for time series analysis. In short, we need to communicate to the model something about the order of the time samples, which we have not done so far.
Consider the data below:
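The raw data are not reproduced here; what the code below assumes is a data frame curves with columns time, y, and a factor curveId, holding many noisy repetitions of the same underlying two-peak curve. A minimal sketch of how such a dataset could be simulated (all numbers are illustrative assumptions, chosen so that n = 50 * 199 = 9950 matches the model summaries):

```r
set.seed(42)
n_curves <- 50                          # illustrative number of curves
grid     <- seq(0, 1, length.out = 199) # illustrative time grid per curve

# a shared underlying shape with a large first peak around time = 0.4
shape <- function(t) 0.4 * exp(-((t - 0.4) / 0.10)^2) +
                     0.15 * exp(-((t - 0.75) / 0.08)^2)

curves <- do.call(rbind, lapply(seq_len(n_curves), function(i) {
  amp <- 1 + rnorm(1, sd = 0.15)        # each curve scales the shape a little
  data.frame(curveId = factor(i),
             time    = grid,
             y       = amp * shape(grid) + rnorm(length(grid), sd = 0.02))
}))
str(curves)
```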
Let’s estimate a GAM:
m1 <- bam(y ~ s(time, k = 20), data = curves)
summary(m1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(time, k = 20)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1795848 0.0004607 389.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 15.52 17.6 6060 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.915 Deviance explained = 91.5%
## fREML = -16489 Scale est. = 0.0021116 n = 9950
plot_smooth(m1, view = "time", col = "slateblue4", print.summary = FALSE, rug = FALSE)
Now, imagine that the same data set were not a collection of time series. For example, instead of time we could have the price of houses (in millions of euros), and y could be some measure of demand. The plot would look like this:
Question: How would you change the GAM above to reflect the fact that there are no curves but it is all one data set?
gam.check(m1)
##
## Method: fREML Optimizer: perf newton
## full convergence after 9 iterations.
## Gradient range [-6.31743e-06,6.479967e-06]
## (score -16488.76 & scale 0.002111591).
## Hessian positive definite, eigenvalue range [8.040793,4974.011].
## Model rank = 20 / 20
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(time) 19.0 15.5 1.02 0.89
We note strong evidence of heteroscedasticity. Where does it come from? Look at the variation around the large peak: when y is high we are near the peak, and that is also where the largest variation across curves occurs, hence the wider residual span at higher fitted values.
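The fan shape can be checked directly by plotting residuals against fitted values. A self-contained sketch, with simulated heteroscedastic data standing in for the real dataset:

```r
library(mgcv)  # ships with R
set.seed(1)
x  <- runif(2000)
mu <- exp(-((x - 0.4) / 0.1)^2)               # a peaked mean function
y  <- mu + rnorm(2000, sd = 0.02 + 0.2 * mu)  # noise sd grows with the mean
m  <- gam(y ~ s(x, k = 20))

# the residual spread widens as the fitted values grow
plot(fitted(m), resid(m), pch = ".", xlab = "fitted values", ylab = "residuals")
```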
Now let’s consider one particular curve, say one whose first peak is higher than the mean. As we see from the raw curves, we expect that if the residual error for this curve at, say, time = 0.4 is positive, it will also be positive at the next time samples. Hence neighbouring residuals are correlated, which violates one of the assumptions of the model. We would like to express this fact in the GAMM.
A way to alleviate the problem above is to induce a structure in the residuals as follows: \[ \epsilon_i = \rho \cdot \epsilon_{i-1} + \psi_i \] which means that the residual at point \(i\) is proportional to the residual at point \(i-1\), i.e. one step earlier on the time axis, plus another unknown term \(\psi_i\), where the \(\psi_i\) are independent and distributed as \(N(0, \sigma^2)\). The GAMM will not estimate the parameter \(\rho\) for us directly; rather, we need to set it ourselves. This type of sub-model for the noise is well known in signal processing and is called AR1, i.e. an autoregressive model of order 1 (because it only depends on one step in the past).
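The recursion above is easy to simulate. A minimal sketch, generating AR1 noise with a known \(\rho = 0.8\) and recovering it from the empirical autocorrelation at lag 1:

```r
set.seed(7)
rho_true <- 0.8
n        <- 10000
psi <- rnorm(n, sd = 0.05)   # the independent innovations psi_i
eps <- numeric(n)
eps[1] <- psi[1]
for (i in 2:n) eps[i] <- rho_true * eps[i - 1] + psi[i]

# the empirical autocorrelation at lag 1 is close to rho_true
acf(eps, plot = FALSE)$acf[2]
```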
What we typically do is this:
The following estimates \(\rho\) and plots the complete autocorrelation function for the residuals:
m1.acf <- acf_resid(m1)
This function estimates the correlation of the residuals with their own past values at different lags. We take the value at lag 1 as our estimate for \(\rho\):
rho <- m1.acf[2]
# at index 1 we have lag 0, whose value is 1 by definition
# at index 2 we have lag 1, which is what we need
rho
## 1
## 0.8045908
Now we re-run the GAMM, setting \(\rho\) via the argument rho. We also need to indicate where each curve starts in the data (argument AR.start); otherwise the AR1 model would also be applied across the boundary between the last sample of one curve and the first sample of the next.
m1.ar1 <- bam(y ~ s(time, k = 20), data = curves,
              rho = rho, AR.start = curves$time == 0)
summary(m1.ar1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(time, k = 20)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.179576 0.001376 130.5 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 12.43 15.13 804.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.915 Deviance explained = 91.5%
## fREML = -21706 Scale est. = 0.0020926 n = 9950
plot_smooth(m1.ar1, view = "time", col = "slateblue4", rug = FALSE, print.summary = FALSE)
acf_resid(m1.ar1)
gam.check(m1.ar1)
##
## Method: fREML Optimizer: perf newton
## full convergence after 7 iterations.
## Gradient range [-1.51787e-05,1.483733e-05]
## (score -21705.86 & scale 0.002092649).
## Hessian positive definite, eigenvalue range [6.899551,4974.007].
## Model rank = 20 / 20
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(time) 19.0 12.4 1.01 0.86
The results show that:
The way to explicitly represent the information about curves, i.e. which sample belongs to which curve, is to introduce a random smooth factor at the curve level:
m1.randCurves <- bam(y ~ s(time, k = 20) +
                       s(time, curveId, bs = "fs", m = 1, k = 15),
                     data = curves, nthreads = 2)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has repeated
## 1-d smooths of same variable.
summary(m1.randCurves)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(time, k = 20) + s(time, curveId, bs = "fs", m = 1, k = 15)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.179574 0.004535 39.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 17.69 18.6 671.54 <2e-16 ***
## s(time,curveId) 505.63 749.0 56.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.984 Deviance explained = 98.5%
## fREML = -24077 Scale est. = 0.00039985 n = 9950
plot_smooth(m1.randCurves, view = "time", col = "slateblue4", rug = FALSE, print.summary = FALSE)
op <- par(mfrow=c(2,2))
gam.check(m1.randCurves)
##
## Method: fREML Optimizer: perf newton
## full convergence after 7 iterations.
## Gradient range [-1.198123e-07,1.173967e-07]
## (score -24076.94 & scale 0.0003998499).
## Hessian positive definite, eigenvalue range [8.57444,4985.08].
## Model rank = 770 / 770
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(time) 19.0 17.7 1.02 0.84
## s(time,curveId) 750.0 505.6 1.02 0.87
acf_resid(m1.randCurves)
par(op)
The results show that:
As an alternative to the mgcv package, you can try the sparseFLMM package, which incorporates curve-level random smooths by default and performs an efficient estimation based on functional PCA. sparseFLMM is (in my opinion) generally less user-friendly than mgcv.
This is a curve dataset with a 4-level factor F4. An AR1 residual with coefficient \(\rho = 0.9\) is added to the 4 expected curves. Let us first model it as y ~ F4 + s(t, by = F4), i.e. a regular GAM with one factor and no AR1 residual.
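The fitting call itself is not echoed above; a hedged reconstruction follows, with a small simulated stand-in for the actual dataset (the column names y, t and F4 are taken from the summary; everything else is an assumption):

```r
library(mgcv)
set.seed(3)
n_per <- 255
t  <- rep(seq(0, 1, length.out = n_per), 4)
F4 <- factor(rep(c("A.1", "A.2", "B.1", "B.2"), each = n_per))
# illustrative level offsets plus white noise
y  <- sin(2 * pi * t) + c(0, 0.48, 0, 0.46)[as.numeric(F4)] +
      rnorm(length(t), sd = 0.05)
dat <- data.frame(y, t, F4)

# one smooth per factor level, plus a parametric level offset
m_f4 <- bam(y ~ F4 + s(t, by = F4, k = 10), data = dat)
```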
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ F4 + s(t, by = F4, k = 10)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.499089 0.001344 371.276 <2e-16 ***
## F4A.2 0.482697 0.001901 253.909 <2e-16 ***
## F4B.1 -0.003519 0.001901 -1.851 0.0643 .
## F4B.2 0.456722 0.001901 240.246 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):F4A.1 8.622 8.959 13909 <2e-16 ***
## s(t):F4A.2 8.819 8.990 15343 <2e-16 ***
## s(t):F4B.1 8.721 8.977 14070 <2e-16 ***
## s(t):F4B.2 8.910 8.998 7570 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.993 Deviance explained = 99.3%
## fREML = -6933.4 Scale est. = 0.0018432 n = 4080
##
## Method: fREML Optimizer: perf newton
## full convergence after 8 iterations.
## Gradient range [-2.118151e-05,2.040069e-05]
## (score -6933.404 & scale 0.001843157).
## Hessian positive definite, eigenvalue range [3.696508,2036.03].
## Model rank = 40 / 40
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(t):F4A.1 9.00 8.62 1.03 0.99
## s(t):F4A.2 9.00 8.82 1.03 0.99
## s(t):F4B.1 9.00 8.72 1.03 0.99
## s(t):F4B.2 9.00 8.91 1.03 0.99
Notice that:
Let us add the AR1 term to the model, taking the estimated value \(\hat{\rho} = 0.878893\):
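The re-fit is again not echoed; presumably it is the same call with rho and AR.start added. A self-contained sketch, with data simulated here using a genuine AR1 residual (\(\rho = 0.9\), as in the dataset's description; all names are assumptions):

```r
library(mgcv)
set.seed(3)
n_per <- 255
t  <- rep(seq(0, 1, length.out = n_per), 4)
F4 <- factor(rep(c("A.1", "A.2", "B.1", "B.2"), each = n_per))
# one independent AR1 noise series per curve
eps <- as.vector(replicate(4, arima.sim(list(ar = 0.9), n_per, sd = 0.02)))
y  <- sin(2 * pi * t) + c(0, 0.48, 0, 0.46)[as.numeric(F4)] + eps
dat <- data.frame(y, t, F4)

rho_hat <- 0.878893   # the estimate quoted in the text
m_f4_ar1 <- bam(y ~ F4 + s(t, by = F4, k = 10), data = dat,
                rho = rho_hat, AR.start = dat$t == 0)
```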
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ F4 + s(t, by = F4, k = 10)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.498847 0.004664 106.953 <2e-16 ***
## F4A.2 0.483057 0.006596 73.231 <2e-16 ***
## F4B.1 -0.003197 0.006596 -0.485 0.628
## F4B.2 0.457320 0.006596 69.328 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):F4A.1 8.618 8.968 1815 <2e-16 ***
## s(t):F4A.2 8.816 8.992 2066 <2e-16 ***
## s(t):F4B.1 8.722 8.983 1837 <2e-16 ***
## s(t):F4B.2 8.912 8.998 1624 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.993 Deviance explained = 99.3%
## fREML = -10094 Scale est. = 0.0016863 n = 4080
##
## Method: fREML Optimizer: perf newton
## full convergence after 6 iterations.
## Gradient range [-3.567584e-05,3.518998e-05]
## (score -10093.55 & scale 0.001686262).
## Hessian positive definite, eigenvalue range [3.762987,2036.03].
## Model rank = 40 / 40
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(t):F4A.1 9.00 8.62 1.03 0.99
## s(t):F4A.2 9.00 8.82 1.03 0.99
## s(t):F4B.1 9.00 8.72 1.03 0.98
## s(t):F4B.2 9.00 8.91 1.03 0.98
Notice that:
Let us now change two things about the dataset; among them, the 4-level factor is replaced by a 2-level factor F2 with levels A and B. Let us model this dataset as: y ~ F2 + s(t, by = F2)
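Once more the fitting call is hidden; a sketch with simulated stand-in data. Here the curve-to-curve variation is genuinely smooth (each curve gets its own random amplitude and phase) rather than AR1 noise; the column names y, t and F2 come from the summary, while curveId and all numbers are assumptions:

```r
library(mgcv)
set.seed(4)
grid <- seq(0, 1, length.out = 51)
dat2 <- do.call(rbind, lapply(1:40, function(i) {
  lev <- if (i <= 20) "A" else "B"
  # each curve deviates smoothly: its own random amplitude and phase
  data.frame(curveId = factor(i), F2 = factor(lev, levels = c("A", "B")),
             t = grid,
             y = (1 + rnorm(1, sd = 0.2)) * sin(2 * pi * grid + rnorm(1, sd = 0.3)) +
                 rnorm(length(grid), sd = 0.05))
}))

m_f2 <- bam(y ~ F2 + s(t, by = F2, k = 10), data = dat2)
```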
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ F2 + s(t, by = F2, k = 10)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.744812 0.006513 114.353 <2e-16 ***
## F2B -0.017690 0.009211 -1.921 0.0549 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):F2A 5.509 6.652 754.4 <2e-16 ***
## s(t):F2B 6.604 7.734 432.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.672 Deviance explained = 67.3%
## fREML = 826.23 Scale est. = 0.086543 n = 4080
##
## Method: fREML Optimizer: perf newton
## full convergence after 8 iterations.
## Gradient range [-5.650425e-07,4.193194e-07]
## (score 826.2282 & scale 0.08654253).
## Hessian positive definite, eigenvalue range [2.159394,2038.006].
## Model rank = 20 / 20
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(t):F2A 9.00 5.51 0.1 <2e-16 ***
## s(t):F2B 9.00 6.60 0.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that:
Let us blindly apply the AR1 recipe, i.e. take the estimated \(\hat{\rho} = 0.9912984\) and re-fit the model:
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ F2 + s(t, by = F2, k = 10)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.74466 0.03844 19.372 <2e-16 ***
## F2B -0.01751 0.05436 -0.322 0.747
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):F2A 8.186 8.868 194 <2e-16 ***
## s(t):F2B 8.658 8.976 254 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.672 Deviance explained = 67.3%
## fREML = -7762.9 Scale est. = 0.06822 n = 4080
##
## Method: fREML Optimizer: perf newton
## full convergence after 8 iterations.
## Gradient range [-3.599553e-09,3.34515e-09]
## (score -7762.912 & scale 0.06821957).
## Hessian positive definite, eigenvalue range [3.542336,2038.014].
## Model rank = 20 / 20
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(t):F2A 9.00 8.19 0.1 <2e-16 ***
## s(t):F2B 9.00 8.66 0.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Notice that:
Let us instead add a curve-specific random smooth term (and remove the AR1 term):
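The formula shown in the summary below can be reproduced like this (a sketch on simulated data with smooth curve-to-curve variation; curveId is assumed to be a factor identifying each curve):

```r
library(mgcv)
set.seed(4)
grid <- seq(0, 1, length.out = 51)
dat2 <- do.call(rbind, lapply(1:40, function(i) {
  lev <- if (i <= 20) "A" else "B"
  # smooth per-curve deviations: random amplitude and phase, plus white noise
  data.frame(curveId = factor(i), F2 = factor(lev, levels = c("A", "B")),
             t = grid,
             y = (1 + rnorm(1, sd = 0.2)) * sin(2 * pi * grid + rnorm(1, sd = 0.3)) +
                 rnorm(length(grid), sd = 0.05))
}))

# one fixed smooth per factor level, plus a random smooth per curve
# within each level (bs = "fs"; m = 1 is a common choice for random smooths)
m_f2_rs <- bam(y ~ F2 + s(t, by = F2, k = 10) +
                 s(t, curveId, by = F2, bs = "fs", m = 1),
               data = dat2)
```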
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ F2 + s(t, by = F2, k = 10) + s(t, curveId, by = F2, bs = "fs",
## m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.74481 0.03880 19.20 <2e-16 ***
## F2B -0.01769 0.05353 -0.33 0.741
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(t):F2A 8.064 8.174 72.45 <2e-16 ***
## s(t):F2B 8.590 8.643 112.88 <2e-16 ***
## s(t,curveId):F2A 349.018 359.000 1248.35 <2e-16 ***
## s(t,curveId):F2B 348.065 359.000 1237.87 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.999 Deviance explained = 99.9%
## fREML = -8261 Scale est. = 0.00039225 n = 4080
##
## Method: fREML Optimizer: perf newton
## full convergence after 17 iterations.
## Gradient range [-0.0001062954,8.748938e-05]
## (score -8260.963 & scale 0.0003922519).
## Hessian positive definite, eigenvalue range [3.40806,2063.484].
## Model rank = 1460 / 1460
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(t):F2A 9.00 8.06 1 0.59
## s(t):F2B 9.00 8.59 1 0.56
## s(t,curveId):F2A 720.00 349.02 1 0.58
## s(t,curveId):F2B 720.00 348.07 1 0.56
Notice that: